Rapid rule-based machine translation between Dutch and Afrikaans
Authors
Abstract
This paper describes the design, development and evaluation of a machine translation system between Dutch and Afrikaans, developed over a period of around a month and a half. The system relies heavily on the re-use of existing publicly available resources such as Wiktionary, Wikipedia and the Apertium machine translation platform. A method of translating compound words between the languages by means of left-to-right longest-match lookup is also introduced and evaluated.
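As a rough illustration of what left-to-right longest-match lookup can look like for compounds, the sketch below greedily takes the longest lexicon entry at the start of the remaining string and concatenates the translations of the parts. It is not the paper's implementation: the function name `translate_compound`, the dictionary `TOY_LEXICON` and its entries are hypothetical, and the sketch ignores linking elements, backtracking and morphological analysis, which a real system would likely need.

```python
def translate_compound(word, lexicon):
    """Greedy left-to-right longest-match split of `word`.

    Returns the concatenated translations of the parts, or None when
    some remainder of the word has no matching lexicon entry.
    (Illustrative sketch only; no backtracking is attempted.)
    """
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking from the right.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in lexicon:
                parts.append(lexicon[piece])
                i = j
                break
        else:
            # No prefix of the remaining string is in the lexicon.
            return None
    return "".join(parts)


# Hypothetical toy Dutch -> Afrikaans entries, for illustration only.
TOY_LEXICON = {
    "water": "water",
    "fles": "bottel",
    "huis": "huis",
    "deur": "deur",
}

if __name__ == "__main__":
    print(translate_compound("waterfles", TOY_LEXICON))
    print(translate_compound("huisdeur", TOY_LEXICON))
    print(translate_compound("waterxyz", TOY_LEXICON))
```

Because the match is greedy, an early long match can block a valid split later in the word; whether and how the paper handles such cases is not stated in the abstract.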
Similar resources
Cross-Lingual Genre Classification for Closely Related Languages
Resource scarcity is a topic that is continually researched by the HLT community, especially for the South African context. We explore the possibility of leveraging existing resources to help facilitate the development of new resources for under-resourced languages by using cross-lingual classification methods. We investigate the application of an Afrikaans genre classification system on Dutch t...
The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Compounding is very productive and leads to practi...
Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans
This article presents initial results on a supervised machine learning approach to determine the semantics of noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine method used for this classification experiment utili...
Exploring unsupervised word segmentation for machine translation in the South African context
We explore the application of unsupervised word segmentation algorithms to phrase-based statistical machine translation (SMT) systems, translating from English to four South African languages: Afrikaans, Northern Sotho, Tsonga and Zulu. Positive results in terms of the standard BLEU and NIST scores are obtained for systems translating into Afrikaans and Zulu.
Syntactic Reordering as Pre-processing Step in Statistical Machine Translation of English to Sesotho sa Leboa and Afrikaans
The output quality of statistical machine translation (SMT) depends to a large extent on the quantity and quality of the parallel corpora on which it is trained. In the case of resource-scarce languages where sufficiently large parallel corpora are not always available, alternative ways of improving the output quality of SMT systems must be sought. In this article, one such method for improvi...